No community feels the convergence of government policies and in-home culture more than women and children. We are interested in studying mothers and infants because of their unique vulnerability and because their health outcomes are disproportionately affected by various exposures and government policies. Our group members had interests in both environmental health and perinatal health, so we decided to assess the effect of environmental exposures on maternal and infant health outcomes. We chose to restrict our analysis of these exposures and outcomes to California because of data availability constraints, the state’s large and racially diverse population, its comprehensive and transparent environmental regulations, and its high use of pesticides. California is the greatest user of pesticides in the US, with over 85 million kg applied annually, an amount equivalent to roughly 30% of the cumulative active ingredients applied to US agriculture.
In 1986, California passed “The Safe Drinking Water and Toxic Enforcement Act of 1986” also known as Proposition 65. Proposition 65 requires businesses to provide warnings regarding significant exposures to chemicals that cause cancer, birth defects or other reproductive harm. By requiring that this information be provided, Proposition 65 enables Californians to make informed decisions about their exposures to these chemicals. California has a list of harmful chemicals as characterized by Proposition 65, which is updated at least once a year and includes over 900 chemicals. Proposition 65 has motivated businesses to eliminate or reduce toxic chemicals in numerous consumer products and has led to the safer reformulation of many products. The law has also been successful in educating the general public about exposures to toxic chemicals in consumer products, buildings, and the environment, which as a result created a demand and market reward for less-toxic products.
California is the most populous state in the United States (with roughly 12% of annual births) and is very diverse in both demographics and landscape across its counties, which increases the diversity of the state-level data. Because of California’s diverse population, we were able to assess exposure to racism (recorded as race) as a potential confounder of maternal health outcomes. Analysis and discussion of race and racism are critical to any research in the field of maternal health. The literature has no shortage of evidence pointing to disproportionately adverse health outcomes for Black mothers and babies in America. Infant mortality rates for America’s Black babies are more than twice the rate for white babies, and Black babies are more than three times as likely to die from complications related to low birth weight. U.S. maternal mortality rates for Black women are also three to four times higher than rates for white women. For this reason we decided to classify “the experience of racism” as a confounder in our analysis and to stratify by race to account for this confounding.
Our initial question was pretty broad: “What is the effect of pesticide use on Maternal and Child Health?” We then narrowed down our scope to include only data from California (inspired by our background research and related work). Over the course of the project, we defined our exposure as pesticide use (continuous variable measured in pounds). We also narrowed down our outcomes of interest to include: fertility, birth weight, and gestational age. We further narrowed our scope of counties to focus on those with high pesticide use and high agricultural activities due to the focus on these areas in the literature we reviewed (mentioned in the “related work” section). The counties we focused on were ranked in our data as the top 4 counties for highest pesticide use and they included: Fresno, Kern, Tulare, and San Joaquin (in that order). We chose to include Los Angeles as a comparison group for the exploratory analysis of the maternal and child health data because it is one of the most populated and most diverse counties in California, and this was of importance to us due to our interest in examining the maternal and infant health outcomes, stratified by race.
*See notes regarding scraping, cleaning, and wrangling methods at each respective code chunk
Pesticide use data for California counties were retrieved from the California Department of Pesticide Regulation Pesticide Use Reporting program: https://www.cdpr.ca.gov/docs/pur/purmain.htm
Maternal and Child Health outcome data was mainly obtained from the following three sources:
1. California Open Data Portal https://data.ca.gov/dataset/live-births-with-low-and-very-low-birthweight
2. CHHS Open Data https://data.chhs.ca.gov/dataset/preterm-and-very-preterm-live-births/resource/cff79e2d-6ecf-4158-9e4f-7078632220ee
3. Centers for Disease Control and Prevention (CDC) Natality Online Database on the Wide-ranging OnLine Data for Epidemiologic Research (WONDER) system, Natality, 2007-2019 https://wonder.cdc.gov/natality-current.html
For each of the variables of interest (Fertility Rate, Gestational Age, and Birth Weight) we first visualized the variable across all counties over the years in a tile plot. We then viewed the trend in our counties of interest by filtering by county and plotting the variable on a line plot. These counties include Fresno, Kern, Tulare, San Joaquin (since they are continuously ranked as the top 4 in highest use of pesticides), and Los Angeles (as a control/comparison county) since Los Angeles is one of the most populated and most diverse counties in California. Fresno, Kern, Tulare, and San Joaquin counties are also all a part of San Joaquin Valley which we mentioned in our background and related work section to be of special interest to us because it is California’s most productive agricultural region and has one of the highest amounts of pesticide use. Since confounding by race was of interest to us, we also analyzed the racial demographics across counties by visualizing the total population of each race across all counties in separate tile plots by race. We also visualized the racial demographics data in our counties of interest (Fresno, Kern, Tulare, San Joaquin and Los Angeles). After assessing the racial demographics, we continued to view the trends of our variables of interest (Fertility Rate, Gestational Age, and Birth Weight) across counties, stratified by race. We did this by visualizing each variable of interest for one specific race in a tile plot and repeated this for each race (so that each race had a tile plot of the variable over the years across all counties). We also plotted the variables, stratified by race, in all our counties of interest (Fresno, Kern, Tulare, San Joaquin and Los Angeles) to identify any trends and outcome variables that vary across different races.
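The tile-then-line workflow described above can be sketched as follows. This is a minimal illustration on made-up numbers, not the project's data: the data frame `toy` and its columns `County`, `Year`, and `FertilityRate` are hypothetical stand-ins for the CDC WONDER columns, and "Alameda" is included only as an example of a county outside the comparison set.

```r
library(ggplot2)

# Hypothetical toy data: one value per county-year (values are made up)
toy <- expand.grid(
  County = c("Fresno", "Kern", "Tulare", "San Joaquin", "Los Angeles", "Alameda"),
  Year = 2007:2016,
  stringsAsFactors = FALSE
)
set.seed(1)
toy$FertilityRate <- round(runif(nrow(toy), min = 55, max = 85), 1)

# Tile plot: every county (rows) by year (columns), colored by the outcome
tile_plot <- ggplot(toy, aes(x = Year, y = County, fill = FertilityRate)) +
  geom_tile(color = "grey50") +
  scale_x_continuous(breaks = 2007:2016) +
  labs(title = "Fertility rate by county and year (toy data)")

# Line plot: keep only the counties of interest and follow each one over time
counties_of_interest <- c("Fresno", "Kern", "Tulare", "San Joaquin", "Los Angeles")
focus <- toy[toy$County %in% counties_of_interest, ]
line_plot <- ggplot(focus, aes(x = Year, y = FertilityRate, color = County)) +
  geom_line() +
  geom_point()
```

For the race-stratified views, the same two plots could be repeated on a data frame filtered to one race at a time, or drawn in one call by adding `facet_wrap()` on a race column.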
We first created an interactive bar graph that lets users see the pattern of low birth weight (<2500 g) and preterm birth (defined as <37 weeks by LMP) across a span of years (2007-2016). The database is limited in that it does not report every county: for confidentiality reasons, values are suppressed for counties with a population of <100,000. We also created a Shiny map (using leaflet) to visualize the general pattern of preterm births and low birth weight across the state of California. We initially used the CDC WONDER database and filtered it to 2016. However, since the CDC database excluded a great number of counties (population <100,000), the map was not as informative as we had expected. Thus, we consulted additional data sources for preterm birth and low birth weight (California Open Data Portal; CHHS Open Data). Using the leaflet map, we were able to recognize a general pattern in the prevalence of preterm births and low birth weight across the state of California. In particular, we saw that relatively high rates of adverse perinatal outcomes occurred in the central region of the state.
We were only able to retrieve pesticide data from the California Department of Pesticide Regulation up to 2016, so the regression analysis was restricted to that year. Counties were categorized based on rank of the top ten agricultural counties, according to the California Department of Food and Agriculture Production Statistics. These counties, in no specific order, were Kern, Tulare, Fresno, Monterey, Merced, Stanislaus, San Joaquin, Imperial, Ventura, Kings county. Most of these counties are located in the San Joaquin Valley and all were highly ranked in terms of pesticide usage, according to the California Department of Pesticide Regulation.
Initially, we considered logistic regression. We were interested in seeing if the different perinatal outcomes (birthweight and gestational age at birth) could predict the odds of being in a top ten agricultural county. Unfortunately, the logistic models did not appear to model the data well, as there were counties that had very small populations. Because they had small populations, they also had few births, and it was possible that no babies were born with low-birth weight or preterm. These appeared as outliers in all the scatterplots, and we felt that excluding those counties was not appropriate. So, instead of looking at the top ten ranking as an outcome, we decided to look at it as a predictor of the different birth outcomes. Linear regression was the next option for regression analysis, since the count data could easily be transformed into rates. Poisson regression was not used because there was difficulty in determining an appropriate offset.
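The switch described above, from counts to rates with top-ten status as a predictor, can be sketched as follows. This is only an illustration under stated assumptions: the data frame `births`, its column names, and every number in it are hypothetical, and the model is far simpler than the ones fitted for the report.

```r
# Hypothetical county-level counts (made-up numbers, for illustration only)
births <- data.frame(
  County = c("A", "B", "C", "D", "E", "F"),
  TotalBirths = c(12000, 800, 15000, 300, 9000, 22000),
  LowBW = c(840, 40, 960, 0, 630, 1430),
  Top10 = c(1, 0, 1, 0, 0, 1)
)

# Convert counts to rates (percent) so small and large counties are comparable
births$LowBWRate <- 100 * births$LowBW / births$TotalBirths

# Linear model: top-ten status as a predictor of the low-birth-weight rate
fit <- lm(LowBWRate ~ Top10, data = births)
coef(fit)
```

County "D" in this toy table has few births and no low-birth-weight events, which is the kind of small-county point the paragraph above describes as an outlier. With a single 0/1 predictor, the fitted slope is simply the difference between the two group means.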
From the exploratory data analysis using ggplot, we learned that Tulare, Fresno, Kings, Kern, and Imperial Counties (all counties with high agricultural and pesticide use rankings) had the highest fertility rates, and had higher fertility rates than Los Angeles (our comparison county). One reason for this could be differences in access to and attitudes regarding family planning and contraception in these rural, agricultural counties. However, we observed that all counties had a decreasing fertility rate over the years, which matches the global and national average trends. Before stratifying by race, we were curious to see whether any one race was disproportionately represented in any of the counties. No specific county stood out, except Los Angeles County, which appeared to be the county where each race had the highest total population (as expected, since Los Angeles is one of the most populated and most diverse counties in California). To assess exposure to racism (measured as race) as a potential confounder, we went on to assess all our outcomes of interest stratified by race.
We first assessed fertility rates stratified by race and found that White and Asian or Pacific Islander populations had the highest fertility rates across all counties, compared to Black or African American and American Indian or Alaska Native populations, with American Indian or Alaska Native populations having the lowest fertility rates across all counties. In all our counties of interest, White populations had the highest fertility rates, followed by Asian or Pacific Islander populations in Fresno County and Tulare County (ranked #1 and #3 for pesticide use, respectively) and Black or African American populations in Kern County and San Joaquin County (ranked #2 and #4 for pesticide use, respectively).
We also analyzed gestational age at birth (measured in weeks via the last menstrual period method) and stratified by race. In the aggregate data, most counties seemed to have a relatively high average gestational age at birth, but Fresno seemed to have a slightly lower average than other counties. Since Fresno is ranked #1 in pesticide use, this may be cause to further explore a potential relationship between pesticide exposure and low gestational age. However, since the data are so similar across counties, there is a high probability that this is not a statistically significant difference. When stratifying by race we found, as expected, that Black or African American populations experienced more variability and lower gestational ages across counties. In all our counties of interest, Black or African American populations had the lowest gestational age at birth compared to other races, a national trend also found in the literature. However, none of the counties of interest had an average gestational age at or below the preterm birth cutoff of 37 weeks.
We also analyzed birth weight (measured in grams) and stratified our analysis by race. In the aggregate data, most counties seemed to have relatively high average birth weights that were very similar across the years, but Fresno seemed to have a slightly lower average birth weight than other counties. Since Fresno is ranked #1 in pesticide use, this may be cause to further explore a potential relationship between pesticide exposure and low birth weight. However, since the data are so similar across counties, there is a high probability that this is not a statistically significant difference. When stratifying by race we found, as expected, that Black or African American populations experienced more variability and lower birth weights across counties. In all our counties of interest, Black or African American populations had the lowest birth weights compared to other races, a national trend also found in the literature. However, none of the counties of interest had an average birth weight at or below the low birth weight cutoff of 2500 grams.
In conclusion, we found that our counties of interest (i.e. counties with high pesticide use) had higher fertility rates on average than counties without high pesticide use. The relationship between exposure to racism (measured as race) and adverse perinatal health outcomes was apparent in this data, as we observed lower birth weights and gestational ages for Black or African American populations compared to other races. However, there was no evidence in the initial data exploration to support a statistically significant conclusion that our counties of interest had adverse perinatal health outcomes compared to counties without high pesticide use.
Using the same Shiny leaflet function, we tried to see if this region was associated with any use of pesticides. Note that California has a broad legal definition of pesticide use: it includes pesticides applied in agriculture, parks, golf courses, cemeteries, etc. As can be seen from the Shiny app, counties in the central region of California (also known as the San Joaquin Valley) were consistently the highest pesticide users from 2007-2016. In fact, the San Joaquin Valley is known as one of the most productive agricultural regions in the world. Naturally, our next question was whether high pesticide use in this particular region was associated with adverse perinatal outcomes. We speculated that there might be a correlation, and thought it would be interesting to investigate the relationship more closely using regression analysis.
We were able to create a linear regression model to determine the association between top ten ranking, average gestational age, and average birth weight by including an interaction term. The model is described in the following equation:
BW = -6880.061 + 262.434 [AGE] + 5256.662 [I(TOP)] - 136.464 [AGE]x[I(TOP)],
where BW is the average birth weight, AGE is the average LMP gestational age, and I(TOP) is an indicator variable for being a top ten agricultural county. According to the adjusted R-squared value, the model explains 46% of the variability in the data. Overall there is a positive relationship between gestational age and birth weight, as expected, but the trend is less pronounced among mothers living in top agricultural counties: the slope there is 262.434 - 136.464 = 125.970 grams per week, compared with 262.434 grams per week elsewhere. Because of the interaction term, the predicted difference between the two groups depends on gestational age. Setting 5256.662 - 136.464 [AGE] = 0 gives a crossover at about 38.5 weeks, so for two babies of the same gestational age above that point, the baby born in a top ranked agricultural county will have a smaller birth weight, on average. This suggests that living in a highly agricultural environment, an environment with a lot of pesticide usage, may have an adverse effect on birth weight.
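To make the interaction concrete, the fitted equation can be evaluated directly. The sketch below only re-uses the four coefficients reported above; the helper names `predict_bw` and `gap` are hypothetical and were introduced here for illustration.

```r
# Coefficients copied from the fitted equation above
b0    <- -6880.061  # intercept
b_age <-  262.434   # slope for AGE (grams per week)
b_top <- 5256.662   # shift for top ten agricultural counties
b_int <- -136.464   # AGE x I(TOP) interaction

# Predicted average birth weight for a given average gestational age
predict_bw <- function(age, top) {
  b0 + b_age * age + b_top * top + b_int * age * top
}

# Slope of birth weight on gestational age in each group
slope_other <- b_age
slope_top   <- b_age + b_int

# Predicted difference (top ten minus other) at a given gestational age,
# and the gestational age at which that difference is zero
gap <- function(age) predict_bw(age, 1) - predict_bw(age, 0)
crossover <- -b_top / b_int
```

Evaluating `gap()` on either side of `crossover` shows the sign change: the model predicts lower birth weights in top ten counties only when the average gestational age is above roughly 38.5 weeks.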
We were also interested in how this trend varies by race. According to the data, Asian, Pacific Islander, and Black mothers tend to have babies that weigh less at birth than American Indian, Alaska Native, or White mothers. We were able to successfully develop an appropriate linear model for all categories except for Asian and Pacific Islander mothers. The models for American Indian and Alaska Native, Black, and White mothers are, respectively:
BW = -3829.713 + 184.977 [AGE],
BW = -5672.432 + 230.120 [AGE] + 5451.896 [I(TOP)] - 142.404 [AGE]x[I(TOP)], and
BW = -5250.834 + 221.394 [AGE] + 6590.076 [I(TOP)] - 170.141 [AGE]x[I(TOP)].
The population of American Indian and Alaska Native people was small in every county, so they will be excluded from the following comparison.
The difference between babies born in high ranked agricultural counties and those not born in high ranked agricultural counties is smaller among those born to Black mothers as compared to those born to White mothers. However, babies born to White mothers tend to be older (in terms of gestational age), and have higher birth weights.
Imperial County appeared as an influential point in most of the linear models. Imperial did not appear to have extreme values in any variable category, and the outliers involved low birth weights for high gestational ages in Asian and Pacific Islander and Black mothers. We already determined that these populations have babies with low birth weights on average, so it was not appropriate to exclude the points from our analysis.
Load Packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ stringr 1.4.0
## ✓ tidyr 1.1.2 ✓ forcats 0.5.0
## ✓ readr 1.3.1
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(pdftools)
## Using poppler version 0.73.0
library(readr)
library(stringr)
library(ggthemes)
library(shiny)
library(shinyBS)
library(RColorBrewer)
library(shinydashboard)
##
## Attaching package: 'shinydashboard'
## The following object is masked from 'package:graphics':
##
## box
library(sp)
library(rgeos)
## rgeos version: 0.5-5, (SVN revision 640)
## GEOS runtime version: 3.8.1-CAPI-1.13.3
## Linking to sp version: 1.4-2
## Polygon checking: TRUE
library(rgdal)
## rgdal: version: 1.5-18, (SVN revision 1082)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.1.4, released 2020/10/20
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 6.3.1, February 10th, 2020, [PJ_VERSION: 631]
## Path to PROJ shared files: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/rgdal/proj
## Linking to sp version:1.4-4
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
library(maptools)
## Checking rgeos availability: TRUE
library(leaflet)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(grid)
We wanted to create a map to visualize the pattern of low birth weight in California. I initially used the CDC WONDER database, but it had very limited information because it grouped rural and small counties into “unidentified counties.” Thus, I consulted another data source, the California Open Data Portal. The data cleaning process follows:
#Data Wrangling for Year 2014-2018 Data for Map.
lbwdata<-read.csv("./low-and-very-low-birthweight-by-county-2014-2018 (1).csv", header = TRUE, stringsAsFactors = FALSE)
lbwdata <- lbwdata %>% mutate(County = str_to_title(County))
lbwdata$Events[is.na(lbwdata$Events)] <- 0
lbwdata <- lbwdata %>% group_by(Year, County, Total.Births) %>% summarize(Events = sum(Events))
## `summarise()` regrouping output by 'Year', 'County' (override with `.groups` argument)
lbwdata <- lbwdata %>% filter(tolower(County) != "california") # County was title-cased above, so compare in lower case to drop the statewide total
lbwdata <- lbwdata %>% mutate(Rate = Events/Total.Births)
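The cleaning steps above can be checked on a tiny example. The sketch below uses a hypothetical data frame `toy_lbw` with made-up values; the upper-case county names and the presence of a statewide "CALIFORNIA" row are assumptions about the raw file, not something confirmed by the source.

```r
library(dplyr)
library(stringr)

# Hypothetical rows shaped like the low-birth-weight file (values are made up)
toy_lbw <- data.frame(
  Year = 2016,
  County = c("FRESNO", "FRESNO", "KERN", "KERN", "CALIFORNIA", "CALIFORNIA"),
  Total.Births = c(100, 100, 50, 50, 150, 150),
  Events = c(6, 2, NA, 3, 6, 5)
)

toy_rates <- toy_lbw %>%
  mutate(County = str_to_title(County),
         Events = ifelse(is.na(Events), 0, Events)) %>%  # suppressed counts become 0
  group_by(Year, County, Total.Births) %>%
  summarize(Events = sum(Events), .groups = "drop") %>%  # one row per county-year
  # compare in lower case so the statewide row is dropped whatever its casing
  filter(str_to_lower(County) != "california") %>%
  mutate(Rate = 100 * Events / Total.Births)
```

Passing `.groups = "drop"` also silences the regrouping message shown in the output above and returns an ungrouped table, which may be easier to merge later.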
We also wanted to create another map to visualize the pattern of preterm birth in California. Again, I ran into a similar problem using the CDC WONDER database. Thus, I consulted the CHHS database. The data cleaning process follows:
ptbirthdata<- read.csv("preterm-and-very-preterm-births-by-county-2010-2018-3.csv", header = TRUE, stringsAsFactors = FALSE)
ptbirthdata$Events[is.na(ptbirthdata$Events)] <- 0
ptbirthdata <- ptbirthdata[,-c(7,8)]
ptbirthdata <- ptbirthdata %>% group_by(Year, County, Total.Births) %>% summarize(Events = sum(Events))
## `summarise()` regrouping output by 'Year', 'County' (override with `.groups` argument)
ptbirthdata <- ptbirthdata %>% filter(tolower(County) != "california") # remove the statewide total, whatever its casing
ptbirthdata <- ptbirthdata %>% mutate(rate_pt = Events/Total.Births * 100)
After cleaning the data set, I created a spatial map. The following code reads the county shapefile, subsets it to California, and merges that spatial object with the low birth weight data wrangled earlier to generate a leaflet map. My main motivation for using a leaflet map was to let the user see which county is which and zoom in and out. Note that some counties had NA values (perhaps counties with a very small population).
map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/lararostomian/Desktop/Harvard/Classes/BST 260/datascience-project/Data Prep (& Final RMD)/cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
"California"
))
lbwdata_2016 <- lbwdata %>% filter(Year == "2016") %>% mutate(Rate = Events/Total.Births*100)
spatial_lbw <- sp::merge(x=SingleState, y=lbwdata_2016, by.x="NAME", by.y="County") # join county polygons to the 2016 rates by county name
bins <- c(4.0,6.3,7.6,8.1, Inf)
pal <- colorBin(
palette = "viridis",
domain = spatial_lbw$Rate, bins = bins) # n=7 removed: colorBin() has no n argument, so it was partially matched to na.color
leaflet(spatial_lbw, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>%
setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
addPolygons(color = "Black", weight = 1, smoothFactor = 0.5,
opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
fillColor = ~pal(Rate),
popup = ~as.factor(paste0("<b><font size=\"4\"><center>County: </b>",spatial_lbw$NAME,"</font></center>","<b>% of Low Birth Weight Births: </b>", sprintf("%1.2f%%", spatial_lbw$Rate),"<br/>"))) %>%
addLegend(pal = pal, values = spatial_lbw$Rate, opacity = 1, title="% Low Birth Weight (Quartiles)")
## Warning in pal(Rate): Some values were outside the color scale and will be
## treated as NA
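The legend above is labeled "Quartiles" while the break points are typed by hand, and the warning suggests that some rates fell outside them. One way to avoid this, sketched here on made-up values, is to compute the break points from the data with `quantile()`. The vector `rate` is hypothetical; only base R is needed to run the example, and the leaflet call is left as a comment.

```r
# Hypothetical county rates in percent (made-up values, including a missing one)
rate <- c(4.8, 5.9, 6.3, 6.6, 7.0, 7.4, 7.9, 8.3, 9.1, NA)

# Quartile break points computed from the data instead of typed by hand
bins <- quantile(rate, probs = seq(0, 1, by = 0.25), na.rm = TRUE, names = FALSE)

# Assign each county to a quartile; include.lowest keeps the minimum in bin 1
quartile <- cut(rate, breaks = bins, include.lowest = TRUE, labels = FALSE)

# The same break points could then be handed to leaflet, for example:
# pal <- leaflet::colorBin("viridis", domain = rate, bins = bins)
```

Because the first and last break points are the observed minimum and maximum, every non-missing rate lands in a bin, and only genuinely missing counties are drawn in the NA color.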
This is a similar spatial map, but for preterm birth. The following code reads the county shapefile, subsets it to California, and merges that spatial object with the preterm birth data wrangled earlier to generate a leaflet map. Note that some counties had NA values (perhaps counties with a very small population).
map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/lararostomian/Desktop/Harvard/Classes/BST 260/datascience-project/Data Prep (& Final RMD)/cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
"California"
))
ptbirthdata_2016 <- ptbirthdata %>% filter(Year == "2016")
spatial_pt <- sp::merge(x=SingleState, y=ptbirthdata_2016, by.x="NAME", by.y="County") # join county polygons to the 2016 rates by county name
bin <- c(5.5, 8.2, 9.1, 9.9, Inf)
pal2 <- colorBin(
palette = "plasma",
domain = spatial_pt$rate_pt, bins = bin) # n=7 removed: colorBin() has no n argument, so it was partially matched to na.color
leaflet(spatial_pt, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>%
setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
addPolygons(color = "Black", weight = 1, smoothFactor = 0.5,
opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
fillColor = ~pal2(rate_pt),
popup = ~as.factor(paste0("<b><font size=\"4\"><center>County: </b>",spatial_pt$NAME,"</font></center>","<b>% of Preterm Birth: </b>", sprintf("%1.2f%%", spatial_pt$rate_pt),"<br/>"))) %>% addLegend(pal = pal2, values = spatial_pt$rate_pt, opacity = 1, title="% Preterm Birth (Quartiles)")
## Warning in pal2(rate_pt): Some values were outside the color scale and will be
## treated as NA
The data sets used for the exploratory analysis of MCH indicators by county in California were downloaded from the CDC WONDER database in “.txt” format. I read the .txt files into the Rmd file and turned them into data frames. The first data frame, MCH.CDC.Data, had maternal and infant health outcomes by county over the years; the second, MCH.CDC.Data_Race, had the same variables but was stratified by mother’s race. I also renamed all the counties in these two data frames to match the names (case and format) used in the pesticide data frames, which makes it easier to compare variables by county. Below is the data wrangling and cleaning code for the maternal and child health data from CDC WONDER.
#Data Wrangling for CDC Data (COMPLETE)
MCH.CDC.Data <- read.delim("NatalityTOTAL.txt", sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data <- MCH.CDC.Data[-c(491:585), ]
MCH.CDC.Data <- MCH.CDC.Data %>% filter(Notes != "Total")
MCH.CDC.Data <- MCH.CDC.Data[ ,-c(1,3,5,7,9)]
MCH.CDC.Data_Race <- read.delim("NatalityRACE.txt", sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data_Race <- MCH.CDC.Data_Race[-c(1794:1931), ]
MCH.CDC.Data_Race <- MCH.CDC.Data_Race[ ,-c(1,3,5,7,9,11)]
#Rename Counties to Match Pesticide Data
MCH.CDC.Data[MCH.CDC.Data$County == "Alameda County, CA", "County"] <-"Alameda"
MCH.CDC.Data[MCH.CDC.Data$County == "Butte County, CA", "County"] <-"Butte"
MCH.CDC.Data[MCH.CDC.Data$County == "Contra Costa County, CA", "County"] <-"Contra Costa"
MCH.CDC.Data[MCH.CDC.Data$County == "El Dorado County, CA", "County"] <-"El Dorado"
MCH.CDC.Data[MCH.CDC.Data$County == "Fresno County, CA", "County"] <-"Fresno"
MCH.CDC.Data[MCH.CDC.Data$County == "Humboldt County, CA", "County"] <-"Humboldt"
MCH.CDC.Data[MCH.CDC.Data$County == "Imperial County, CA", "County"] <-"Imperial"
MCH.CDC.Data[MCH.CDC.Data$County == "Kern County, CA", "County"] <-"Kern"
MCH.CDC.Data[MCH.CDC.Data$County == "Kings County, CA", "County"] <-"Kings"
MCH.CDC.Data[MCH.CDC.Data$County == "Los Angeles County, CA", "County"] <-"Los Angeles"
MCH.CDC.Data[MCH.CDC.Data$County == "Madera County, CA", "County"] <-"Madera"
MCH.CDC.Data[MCH.CDC.Data$County == "Marin County, CA", "County"] <-"Marin"
MCH.CDC.Data[MCH.CDC.Data$County == "Mariposa County, CA", "County"] <-"Mariposa"
MCH.CDC.Data[MCH.CDC.Data$County == "Merced County, CA", "County"] <-"Merced"
MCH.CDC.Data[MCH.CDC.Data$County == "Monterey County, CA", "County"] <-"Monterey"
MCH.CDC.Data[MCH.CDC.Data$County == "Napa County, CA", "County"] <-"Napa"
MCH.CDC.Data[MCH.CDC.Data$County == "Orange County, CA", "County"] <-"Orange"
MCH.CDC.Data[MCH.CDC.Data$County == "Placer County, CA", "County"] <-"Placer"
MCH.CDC.Data[MCH.CDC.Data$County == "Riverside County, CA", "County"] <-"Riverside"
MCH.CDC.Data[MCH.CDC.Data$County == "Sacramento County, CA", "County"] <-"Sacramento"
MCH.CDC.Data[MCH.CDC.Data$County == "San Bernardino County, CA", "County"] <-"San Bernardino"
MCH.CDC.Data[MCH.CDC.Data$County == "San Diego County, CA", "County"] <-"San Diego"
MCH.CDC.Data[MCH.CDC.Data$County == "San Francisco County, CA", "County"] <-"San Francisco"
MCH.CDC.Data[MCH.CDC.Data$County == "San Joaquin County, CA", "County"] <-"San Joaquin"
MCH.CDC.Data[MCH.CDC.Data$County == "San Luis Obispo County, CA", "County"] <-"San Luis Obispo"
MCH.CDC.Data[MCH.CDC.Data$County == "San Mateo County, CA", "County"] <-"San Mateo"
MCH.CDC.Data[MCH.CDC.Data$County == "Santa Barbara County, CA", "County"] <-"Santa Barbara"
MCH.CDC.Data[MCH.CDC.Data$County == "Santa Clara County, CA", "County"] <-"Santa Clara"
MCH.CDC.Data[MCH.CDC.Data$County == "Santa Cruz County, CA", "County"] <-"Santa Cruz"
MCH.CDC.Data[MCH.CDC.Data$County == "Shasta County, CA", "County"] <-"Shasta"
MCH.CDC.Data[MCH.CDC.Data$County == "Solano County, CA", "County"] <-"Solano"
MCH.CDC.Data[MCH.CDC.Data$County == "Sonoma County, CA", "County"] <-"Sonoma"
MCH.CDC.Data[MCH.CDC.Data$County == "Stanislaus County, CA", "County"] <-"Stanislaus"
MCH.CDC.Data[MCH.CDC.Data$County == "Tulare County, CA", "County"] <-"Tulare"
MCH.CDC.Data[MCH.CDC.Data$County == "Ventura County, CA", "County"] <-"Ventura"
MCH.CDC.Data[MCH.CDC.Data$County == "Yolo County, CA", "County"] <-"Yolo"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Alameda County, CA", "County"] <-"Alameda"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Butte County, CA", "County"] <-"Butte"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Contra Costa County, CA", "County"] <-"Contra Costa"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "El Dorado County, CA", "County"] <-"El Dorado"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Fresno County, CA", "County"] <-"Fresno"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Humboldt County, CA", "County"] <-"Humboldt"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Imperial County, CA", "County"] <-"Imperial"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Kern County, CA", "County"] <-"Kern"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Kings County, CA", "County"] <-"Kings"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Los Angeles County, CA", "County"] <-"Los Angeles"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Madera County, CA", "County"] <-"Madera"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Marin County, CA", "County"] <-"Marin"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Mariposa County, CA", "County"] <-"Mariposa"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Merced County, CA", "County"] <-"Merced"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Monterey County, CA", "County"] <-"Monterey"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Napa County, CA", "County"] <-"Napa"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Orange County, CA", "County"] <-"Orange"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Placer County, CA", "County"] <-"Placer"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Riverside County, CA", "County"] <-"Riverside"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Sacramento County, CA", "County"] <-"Sacramento"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Bernardino County, CA", "County"] <-"San Bernardino"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Diego County, CA", "County"] <-"San Diego"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Francisco County, CA", "County"] <-"San Francisco"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Joaquin County, CA", "County"] <-"San Joaquin"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Luis Obispo County, CA", "County"] <-"San Luis Obispo"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "San Mateo County, CA", "County"] <-"San Mateo"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Santa Barbara County, CA", "County"] <-"Santa Barbara"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Santa Clara County, CA", "County"] <-"Santa Clara"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Santa Cruz County, CA", "County"] <-"Santa Cruz"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Shasta County, CA", "County"] <-"Shasta"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Solano County, CA", "County"] <-"Solano"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Sonoma County, CA", "County"] <-"Sonoma"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Stanislaus County, CA", "County"] <-"Stanislaus"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Tulare County, CA", "County"] <-"Tulare"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Ventura County, CA", "County"] <-"Ventura"
MCH.CDC.Data_Race[MCH.CDC.Data_Race$County == "Yolo County, CA", "County"] <-"Yolo"
MCH.CDC.Data_Race<- MCH.CDC.Data_Race %>% rename("Mothers.Race" = "Mother.s.Bridged.Race")
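The county-by-county recoding above could likely be collapsed into a single pattern replacement, which also avoids per-line typos. The following is a minimal sketch on a toy character vector (the values are illustrative, not the real data), assuming every county label ends in the literal suffix " County, CA":

```r
# Toy stand-in for the County column (illustrative values only)
county_raw <- c("Santa Clara County, CA",
                "Yolo County, CA",
                "Los Angeles County, CA",
                "Unidentified Counties, CA")

# Drop the suffix " County, CA" when it appears at the end of the string;
# labels without that exact suffix are left unchanged
county_clean <- sub(" County, CA$", "", county_raw)

print(county_clean)
```

Applied to the real data frames, the same call would be assigned back to the `County` column of `MCH.CDC.Data` and `MCH.CDC.Data_Race`.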
Data Key:
The first variable I examined was fertility rate. I first visualized fertility rates across all counties and years in a tile plot. I then viewed the trend in our counties of interest: Fresno, Kern, Tulare, and San Joaquin (which consistently rank as the top four counties for pesticide use) and Los Angeles (as a comparison county, since it is one of the most populous and most diverse counties in California). Fresno, Kern, Tulare, and San Joaquin counties are also all part of the San Joaquin Valley, which, as noted in our background and related work section, is of special interest to us because it is California’s most productive agricultural region and has some of the highest pesticide use in the state.
#Fertility Rate GGPLOTS
MCH.CDC.Data %>%
ggplot(aes(x = Year, y = County, fill = Fertility.Rate)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Fertility Rate", limits = c(30,100),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Fertility Rate by County") +
ylab("") + xlab("")
#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Fresno County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)
#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Kern County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)
#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Tulare County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)
#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in San Joaquin County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)
#Los Angeles, Comparison Group
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Los Angeles County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)
#Grid Plots
p1 <- MCH.CDC.Data %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Fresno County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)
p2 <- MCH.CDC.Data %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Kern County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)
p3 <-MCH.CDC.Data %>% group_by(County) %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Tulare County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)
p4 <-MCH.CDC.Data %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Los Angeles County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)
grid.arrange(p1, p2, p3, p4, bottom = "Data Source: CDC WONDER Online Database")
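The per-county plots above repeat the same pipeline with only the county name changed. One way to reduce that repetition is a small helper function; the sketch below uses a made-up toy data frame, and `county_trend_plot` is a hypothetical name, not part of the original analysis:

```r
library(ggplot2)

# Toy data standing in for MCH.CDC.Data (values are made up)
toy <- data.frame(
  County = rep(c("Fresno", "Kern"), each = 3),
  Year = rep(2007:2009, times = 2),
  Fertility.Rate = c(80, 78, 75, 82, 81, 79)
)

# Hypothetical helper: one fertility-rate line plot for a single county
county_trend_plot <- function(data, county) {
  ggplot(data[data$County == county, ], aes(Year, Fertility.Rate)) +
    geom_line() +
    labs(y = "Fertility Rate",
         title = paste("Fertility Rate in", county, "County")) +
    ylim(30, 100)
}

# Build one plot per county of interest
plots <- lapply(c("Fresno", "Kern"), county_trend_plot, data = toy)
```

The resulting list can then be passed to `grid.arrange()` through its `grobs` argument, which makes it easier to include all five counties of interest in the grid.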
The next variable I examined was racial demographics. I stratified MCH.CDC.Data by race (via the MCH.CDC.Data_Race data frame) and viewed the total population of each race across all counties in tile plots, to assess whether any one county had a denser population of a particular race. I then viewed the racial demographic trends in the same counties of interest: Fresno, Kern, Tulare, and San Joaquin (the top four counties for pesticide use, all in the San Joaquin Valley) and Los Angeles (as a comparison county).
#Racial Demographic GGPLOTS
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County, fill = Total.Population)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("American Indian or Alaska Native Population",
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Racial Demographics by County") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County, fill = Total.Population)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Asian or Pacific Islander",
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Racial Demographics by County") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County, fill = Total.Population)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Black or African American Population",
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Racial Demographics by County") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County, fill = Total.Population)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("White Population",
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Racial Demographics by County") +
ylab("") + xlab("")
#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Fresno County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()
#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Kern County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()
#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Tulare") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Tulare County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()
#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in San Joaquin County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()
#Los Angeles, Comparison Group
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Los Angeles County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()
##### Fertility Rate (Stratified By Race)
The next variable I examined was once again fertility rate, this time stratified by race. I first visualized fertility rates across all counties and years for each race in a tile plot. I then viewed the trends in fertility rate by race in the same counties of interest: Fresno, Kern, Tulare, and San Joaquin (the top four counties for pesticide use, all in the San Joaquin Valley) and Los Angeles (as a comparison county).
#Race and Fertility GGPLOTS
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County, fill = Fertility.Rate)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Fertility Rate", limits = c(0,115),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Fertility Rate by County for American Indian or Alaska Native Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County, fill = Fertility.Rate)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Fertility Rate",limits = c(0,115),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Fertility Rate by County for Asian or Pacific Islander Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County, fill = Fertility.Rate)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Fertility Rate", limits = c(0,115),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Fertility Rate by County for Black or African American Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County, fill = Fertility.Rate)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Fertility Rate", limits = c(0,115),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Fertility Rate by County for White Pop.") +
ylab("") + xlab("")
#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)
#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)
#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)
#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)
#Los Angeles, Comparison
MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)
The next variable I examined was preterm birth, measured as average LMP (last menstrual period) gestational age in weeks, with added stratification by race. I first visualized average gestational age across all counties and years, then stratified the tile plots by race. I then viewed the trends in gestational age by race in the same counties of interest: Fresno, Kern, Tulare, and San Joaquin (the top four counties for pesticide use, all in the San Joaquin Valley) and Los Angeles (as a comparison county). In the county-level stratified plots I also included a horizontal line at 37 weeks, the cutoff for defining preterm birth.
#Preterm Birth GGPLOTS (With Race)
MCH.CDC.Data %>%
ggplot(aes(x = Year, y = County, fill = Average.LMP.Gestational.Age)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Gestational Age by County") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County, fill = Average.LMP.Gestational.Age)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Gestational Age by County for American Indian or Alaska Native Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County, fill = Average.LMP.Gestational.Age)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Gestational Age by County for Asian or Pacific Islander Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County, fill = Average.LMP.Gestational.Age)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Gestational Age by County for Black or African American Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County, fill = Average.LMP.Gestational.Age)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Gestational Age by County for White Pop.") +
ylab("") + xlab("")
#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)
#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)
#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)
#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)
#Los Angeles County, Comparison Group
MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)
#Preterm Birth Cutoff is 37 Weeks (Horizontal Line)
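Besides drawing the 37-week line, the same cutoff can be applied directly to the table to list which county and race groups have an average gestational age below it. This is a minimal sketch on made-up values; the column names mirror those used above, but the group labels and numbers are purely illustrative:

```r
# Toy stand-in for a slice of MCH.CDC.Data_Race (values are made up)
toy <- data.frame(
  County = c("A", "A", "B", "B"),
  Mothers.Race = c("Group 1", "Group 2", "Group 1", "Group 2"),
  Average.LMP.Gestational.Age = c(38.6, 36.8, 39.1, 37.0)
)

# Preterm birth is defined as delivery before 37 completed weeks
preterm_cutoff <- 37

# Flag groups whose *average* gestational age falls below the cutoff
toy$below_cutoff <- toy$Average.LMP.Gestational.Age < preterm_cutoff

print(toy[toy$below_cutoff, ])
```

Because the variable is a group average, a value at or above 37 weeks does not mean the group had no preterm births; the flag only identifies groups whose average itself falls below the cutoff.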
The next variable I examined was birth weight, measured in grams, with added stratification by race. I first visualized average birth weight across all counties and years, then stratified the tile plots by race. I then viewed the trends in birth weight by race in the same counties of interest: Fresno, Kern, Tulare, and San Joaquin (the top four counties for pesticide use, all in the San Joaquin Valley) and Los Angeles (as a comparison county). In the county-level stratified plots I also included a horizontal line at 2500 grams, the cutoff for defining low birth weight.
#Birth weight GGPLOTS (With Race)
MCH.CDC.Data %>%
ggplot(aes(x = Year, y = County, fill = Average.Birth.Weight)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Birth Weight by County") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County, fill = Average.Birth.Weight)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Birth Weight by County for American Indian or Alaska Native Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County, fill = Average.Birth.Weight)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Birth Weight by County for Asian or Pacific Islander Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County, fill = Average.Birth.Weight)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Birth Weight by County for Black or African American Pop.") +
ylab("") + xlab("")
MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County, fill = Average.Birth.Weight)) +
geom_tile(color = "grey50") +
scale_x_continuous(expand = c(0,0)) +
scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
colors = brewer.pal(9, "Reds")) +
theme_minimal() +
theme(panel.grid = element_blank()) +
ggtitle("Average Birth Weight by County for White Pop.") +
ylab("") + xlab("")
#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)
#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)
#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)
#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)
#Los Angeles, Comparison Group
MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2) + ylim(2400, 3600)
#LBW Cutoff is 2500 Grams (Horizontal Line)
#### Pesticide Data Wrangling
The
county_ranks16 <- read_delim("table1_county_rank_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
## COUNTY = col_character(),
## LBS_2015 = col_double(),
## RANK_2015 = col_double(),
## LBS_2016 = col_double(),
## RANK_2016 = col_double()
## )
repro_lbs16 <- read_delim("table3_reproductive_lbs_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
## CHEMICAL = col_character(),
## LBS_2007 = col_double(),
## LBS_2008 = col_double(),
## LBS_2009 = col_double(),
## LBS_2010 = col_double(),
## LBS_2011 = col_double(),
## LBS_2012 = col_double(),
## LBS_2013 = col_double(),
## LBS_2014 = col_double(),
## LBS_2015 = col_double(),
## LBS_2016 = col_double()
## )
repro_acre16 <- read_delim("table4_reproductive_acres_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
## CHEMNAME = col_character(),
## ACRES_2007 = col_double(),
## ACRES_2008 = col_double(),
## ACRES_2009 = col_double(),
## ACRES_2010 = col_double(),
## ACRES_2011 = col_double(),
## ACRES_2012 = col_double(),
## ACRES_2013 = col_double(),
## ACRES_2014 = col_double(),
## ACRES_2015 = col_double(),
## ACRES_2016 = col_double()
## )
table1_2016 <- county_ranks16 %>% transmute(county = COUNTY,
lbs_2015 = LBS_2015, rank_2015 = RANK_2015,
lbs_2016 = LBS_2016, rank_2016 = RANK_2016)
# column 1 is the county
# columns 2-3 have the previous year data
# columns 4-5 have the current year data
# we only want columns 1-3 for the most up-to-date data for all years before 2016
all_dat <- list(read_csv("table1_2007.csv")[1:3],
read_csv("table1_2008.csv")[1:3],
read_csv("table1_2009.csv")[1:3],
read_csv("table1_2010.csv")[1:3],
read_csv("table1_2011.csv")[1:3],
read_csv("table1_2012.csv")[1:3],
read_csv("table1_2013.csv")[1:3],
read_csv("table1_2014.csv")[1:3],
read_csv("table1_2015.csv")[1:3])
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2006 = col_double(),
## rank_2006 = col_double(),
## lbs_2007 = col_double(),
## rank_2007 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2007 = col_double(),
## rank_2007 = col_double(),
## lbs_2008 = col_double(),
## rank_2008 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2008 = col_double(),
## rank_2008 = col_double(),
## lbs_2009 = col_double(),
## rank_2009 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2009 = col_double(),
## rank_2009 = col_double(),
## lbs_2010 = col_double(),
## rank_2010 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2010 = col_double(),
## rank_2010 = col_double(),
## lbs_2011 = col_double(),
## rank_2011 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2011 = col_double(),
## rank_2011 = col_double(),
## lbs_2012 = col_double(),
## rank_2012 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2012 = col_double(),
## rank_2012 = col_double(),
## lbs_2013 = col_double(),
## rank_2013 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2013 = col_double(),
## rank_2013 = col_double(),
## lbs_2014 = col_double(),
## rank_2014 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2014 = col_double(),
## rank_2014 = col_double(),
## lbs_2015 = col_double(),
## rank_2015 = col_double()
## )
table1 <- Reduce(function(x, y) left_join(x, y, by = "county"), all_dat)
long_table1 <- table1 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
#table1_ranks <- long_table1 %>% filter(str_starts(usage, "rank"))
table1_lbs <- long_table1 %>% filter(str_starts(usage, "lbs"))
long_table2 <- table1_2016 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
#table1_ranks_1516 <- long_table2 %>% filter(str_starts(usage, "rank"))
table1_lbs_1516<- long_table2 %>% filter(str_starts(usage, "lbs"))
table1_lbs$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs$usage))
table1_lbs_1516$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs_1516$usage))
combined_pesticide_use <- table1_lbs %>% full_join(table1_lbs_1516)
## Joining, by = c("county", "usage", "value")
class(combined_pesticide_use$usage)
## [1] "numeric"
combined_pesticide_use <- combined_pesticide_use %>% group_by(usage)
combined_pesticide_use <- combined_pesticide_use %>% arrange(usage)
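The wrangling steps above (read one file per year, keep the county and previous-year columns, join on county, reshape to long format, and extract the year from the column name) can be sketched end to end in base R. The example below is only an illustration under stated assumptions: it writes two tiny made-up files to a temporary directory rather than using the real tables, and it uses `merge()` and `read.csv()` in place of `left_join()` and `read_csv()`:

```r
# Write two tiny illustrative files to a temporary directory (values are made up)
tmp <- tempdir()
write.csv(data.frame(county = c("Fresno", "Kern"),
                     lbs_2006 = c(100, 90), rank_2006 = c(1, 2),
                     lbs_2007 = c(110, 95), rank_2007 = c(1, 2)),
          file.path(tmp, "table1_2007.csv"), row.names = FALSE)
write.csv(data.frame(county = c("Fresno", "Kern"),
                     lbs_2007 = c(111, 96), rank_2007 = c(1, 2),
                     lbs_2008 = c(120, 99), rank_2008 = c(1, 2)),
          file.path(tmp, "table1_2008.csv"), row.names = FALSE)

# Build the file names from the report years instead of typing each one
years <- 2007:2008
files <- file.path(tmp, sprintf("table1_%d.csv", years))

# Keep columns 1-3: county plus the previous year's pounds and rank
all_dat <- lapply(files, function(f) read.csv(f)[1:3])

# Join the yearly tables on county (left join, as in the original)
wide <- Reduce(function(x, y) merge(x, y, by = "county", all.x = TRUE), all_dat)

# Reshape the lbs_* columns to long format, pulling the year out of the name
lbs_cols <- grep("^lbs_", names(wide), value = TRUE)
long <- do.call(rbind, lapply(lbs_cols, function(col) {
  data.frame(county = wide$county,
             year = as.numeric(gsub("[^[:digit:]]+", "", col)),
             lbs = wide[[col]])
}))
long <- long[order(long$year, long$county), ]

print(long)
```

Note that in this toy example the 2007 value comes from the 2008 file (111 rather than 110 for Fresno), which mirrors the choice above to take each year's figures from the following year's report.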
This is my data wrangling process for the low birth weight data from the CDC WONDER database. By default, the CDC WONDER live birth database only displays counties with a population >100,000. I looked only at low birth weight here, and this data is for the bar graph in my Shiny app.
#MCH_CDC Data for low birth weight
#data wrangling mch cdc data
cdc_lowbirthweight <- read.delim("MCH CDC Data.txt", sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
cdc_lowbirthweight <- cdc_lowbirthweight [-c(482:538), ]
cdc_lowbirthweight <- cdc_lowbirthweight [ ,-c(1, 3, 5, 7)]
MCH.CDC.Data.Total <- read.delim("MCH CDC Data Total.txt", sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data.Total <- MCH.CDC.Data.Total[,-c(1, 3, 5)]
MCH.CDC.Data.Total %>% rename("Total Birth" = "Births")
## Year County Total Birth
## 1 2007 Alameda County, CA 21522
## 2 2007 Butte County, CA 2523
## 3 2007 Contra Costa County, CA 13487
## 4 2007 El Dorado County, CA 1882
## 5 2007 Fresno County, CA 17292
## 6 2007 Humboldt County, CA 1599
## 7 2007 Imperial County, CA 3146
## 8 2007 Kern County, CA 15336
## 9 2007 Kings County, CA 2781
## 10 2007 Los Angeles County, CA 151908
## 11 2007 Madera County, CA 2612
## 12 2007 Marin County, CA 2820
## 13 2007 Merced County, CA 4652
## 14 2007 Monterey County, CA 7551
## 15 2007 Napa County, CA 1665
## 16 2007 Orange County, CA 44038
## 17 2007 Placer County, CA 4054
## 18 2007 Riverside County, CA 34563
## 19 2007 Sacramento County, CA 22119
## 20 2007 San Bernardino County, CA 35190
## 21 2007 San Diego County, CA 47569
## 22 2007 San Francisco County, CA 9129
## 23 2007 San Joaquin County, CA 11600
## 24 2007 San Luis Obispo County, CA 2884
## 25 2007 San Mateo County, CA 9914
## 26 2007 Santa Barbara County, CA 6292
## 27 2007 Santa Clara County, CA 27490
## 28 2007 Santa Cruz County, CA 3571
## 29 2007 Shasta County, CA 2230
## 30 2007 Solano County, CA 5849
## 31 2007 Sonoma County, CA 5742
## 32 2007 Stanislaus County, CA 8827
## 33 2007 Tulare County, CA 8507
## 34 2007 Ventura County, CA 12198
## 35 2007 Yolo County, CA 2522
## 36 2007 Unidentified Counties, CA 11350
## 37 2007 566414
## 38 2008 Alameda County, CA 20976
## 39 2008 Butte County, CA 2520
## 40 2008 Contra Costa County, CA 13135
## 41 2008 El Dorado County, CA 1814
## 42 2008 Fresno County, CA 16764
## 43 2008 Humboldt County, CA 1601
## 44 2008 Imperial County, CA 3241
## 45 2008 Kern County, CA 15316
## 46 2008 Kings County, CA 2711
## 47 2008 Los Angeles County, CA 147745
## 48 2008 Madera County, CA 2535
## 49 2008 Marin County, CA 2719
## 50 2008 Merced County, CA 4422
## 51 2008 Monterey County, CA 7435
## 52 2008 Napa County, CA 1671
## 53 2008 Orange County, CA 42467
## 54 2008 Placer County, CA 4035
## 55 2008 Riverside County, CA 32881
## 56 2008 Sacramento County, CA 21397
## 57 2008 San Bernardino County, CA 33837
## 58 2008 San Diego County, CA 46755
## 59 2008 San Francisco County, CA 9106
## 60 2008 San Joaquin County, CA 11030
## 61 2008 San Luis Obispo County, CA 2739
## 62 2008 San Mateo County, CA 9770
## 63 2008 Santa Barbara County, CA 6320
## 64 2008 Santa Clara County, CA 26731
## 65 2008 Santa Cruz County, CA 3537
## 66 2008 Shasta County, CA 2186
## 67 2008 Solano County, CA 5609
## 68 2008 Sonoma County, CA 5763
## 69 2008 Stanislaus County, CA 8550
## 70 2008 Tulare County, CA 8535
## 71 2008 Ventura County, CA 12075
## 72 2008 Yolo County, CA 2669
## 73 2008 Unidentified Counties, CA 11182
## 74 2008 551779
## 75 2009 Alameda County, CA 20325
## 76 2009 Butte County, CA 2440
## 77 2009 Contra Costa County, CA 12686
## 78 2009 El Dorado County, CA 1727
## 79 2009 Fresno County, CA 16271
## 80 2009 Humboldt County, CA 1541
## 81 2009 Imperial County, CA 3151
## 82 2009 Kern County, CA 14828
## 83 2009 Kings County, CA 2645
## 84 2009 Los Angeles County, CA 139757
## 85 2009 Madera County, CA 2390
## 86 2009 Marin County, CA 2496
## 87 2009 Merced County, CA 4407
## 88 2009 Monterey County, CA 7070
## 89 2009 Napa County, CA 1653
## 90 2009 Orange County, CA 40437
## 91 2009 Placer County, CA 3810
## 92 2009 Riverside County, CA 31605
## 93 2009 Sacramento County, CA 20433
## 94 2009 San Bernardino County, CA 32006
## 95 2009 San Diego County, CA 44982
## 96 2009 San Francisco County, CA 8810
## 97 2009 San Joaquin County, CA 10876
## 98 2009 San Luis Obispo County, CA 2617
## 99 2009 San Mateo County, CA 9452
## 100 2009 Santa Barbara County, CA 6041
## 101 2009 Santa Clara County, CA 25203
## 102 2009 Santa Cruz County, CA 3299
## 103 2009 Shasta County, CA 2068
## 104 2009 Solano County, CA 5393
## 105 2009 Sonoma County, CA 5685
## 106 2009 Stanislaus County, CA 7942
## 107 2009 Tulare County, CA 8361
## 108 2009 Ventura County, CA 11360
## 109 2009 Yolo County, CA 2483
## 110 2009 Unidentified Counties, CA 10770
## 111 2009 527020
## 112 2010 Alameda County, CA 19306
## 113 2010 Butte County, CA 2457
## 114 2010 Contra Costa County, CA 12358
## 115 2010 El Dorado County, CA 1621
## 116 2010 Fresno County, CA 16283
## 117 2010 Humboldt County, CA 1551
## 118 2010 Imperial County, CA 3081
## 119 2010 Kern County, CA 14419
## 120 2010 Kings County, CA 2509
## 121 2010 Los Angeles County, CA 133252
## 122 2010 Madera County, CA 2434
## 123 2010 Marin County, CA 2371
## 124 2010 Merced County, CA 4249
## 125 2010 Monterey County, CA 6765
## 126 2010 Napa County, CA 1525
## 127 2010 Orange County, CA 38250
## 128 2010 Placer County, CA 3825
## 129 2010 Riverside County, CA 30670
## 130 2010 Sacramento County, CA 20056
## 131 2010 San Bernardino County, CA 31368
## 132 2010 San Diego County, CA 44867
## 133 2010 San Francisco County, CA 8806
## 134 2010 San Joaquin County, CA 10596
## 135 2010 San Luis Obispo County, CA 2735
## 136 2010 San Mateo County, CA 9194
## 137 2010 Santa Barbara County, CA 5821
## 138 2010 Santa Clara County, CA 23940
## 139 2010 Santa Cruz County, CA 3192
## 140 2010 Shasta County, CA 2136
## 141 2010 Solano County, CA 5050
## 142 2010 Sonoma County, CA 5393
## 143 2010 Stanislaus County, CA 7806
## 144 2010 Tulare County, CA 8155
## 145 2010 Ventura County, CA 11150
## 146 2010 Yolo County, CA 2427
## 147 2010 Unidentified Counties, CA 10580
## 148 2010 510198
## 149 2011 Alameda County, CA 19003
## 150 2011 Butte County, CA 2391
## 151 2011 Contra Costa County, CA 12060
## 152 2011 El Dorado County, CA 1630
## 153 2011 Fresno County, CA 16160
## 154 2011 Humboldt County, CA 1448
## 155 2011 Imperial County, CA 3079
## 156 2011 Kern County, CA 14287
## 157 2011 Kings County, CA 2567
## 158 2011 Los Angeles County, CA 130370
## 159 2011 Madera County, CA 2401
## 160 2011 Marin County, CA 2386
## 161 2011 Merced County, CA 4280
## 162 2011 Monterey County, CA 6812
## 163 2011 Napa County, CA 1572
## 164 2011 Orange County, CA 38101
## 165 2011 Placer County, CA 3834
## 166 2011 Riverside County, CA 30611
## 167 2011 Sacramento County, CA 20002
## 168 2011 San Bernardino County, CA 30566
## 169 2011 San Diego County, CA 43643
## 170 2011 San Francisco County, CA 8813
## 171 2011 San Joaquin County, CA 10329
## 172 2011 San Luis Obispo County, CA 2631
## 173 2011 San Mateo County, CA 9048
## 174 2011 Santa Barbara County, CA 5804
## 175 2011 Santa Clara County, CA 23649
## 176 2011 Santa Cruz County, CA 3233
## 177 2011 Shasta County, CA 2022
## 178 2011 Solano County, CA 5160
## 179 2011 Sonoma County, CA 5150
## 180 2011 Stanislaus County, CA 7738
## 181 2011 Tulare County, CA 7966
## 182 2011 Ventura County, CA 10656
## 183 2011 Yolo County, CA 2341
## 184 2011 Unidentified Counties, CA 10377
## 185 2011 502120
## 186 2012 Alameda County, CA 19546
## 187 2012 Butte County, CA 2399
## 188 2012 Contra Costa County, CA 12065
## 189 2012 El Dorado County, CA 1513
## 190 2012 Fresno County, CA 15955
## 191 2012 Humboldt County, CA 1504
## 192 2012 Imperial County, CA 3054
## 193 2012 Kern County, CA 14553
## 194 2012 Kings County, CA 2358
## 195 2012 Los Angeles County, CA 131664
## 196 2012 Madera County, CA 2257
## 197 2012 Marin County, CA 2305
## 198 2012 Merced County, CA 4312
## 199 2012 Monterey County, CA 6652
## 200 2012 Napa County, CA 1431
## 201 2012 Orange County, CA 38183
## 202 2012 Placer County, CA 3648
## 203 2012 Riverside County, CA 30300
## 204 2012 Sacramento County, CA 19623
## 205 2012 San Bernardino County, CA 30701
## 206 2012 San Diego County, CA 44396
## 207 2012 San Francisco County, CA 9075
## 208 2012 San Joaquin County, CA 10129
## 209 2012 San Luis Obispo County, CA 2580
## 210 2012 San Mateo County, CA 9185
## 211 2012 Santa Barbara County, CA 5585
## 212 2012 Santa Clara County, CA 24308
## 213 2012 Santa Cruz County, CA 3083
## 214 2012 Shasta County, CA 2109
## 215 2012 Solano County, CA 5062
## 216 2012 Sonoma County, CA 5143
## 217 2012 Stanislaus County, CA 7591
## 218 2012 Tulare County, CA 8000
## 219 2012 Ventura County, CA 10641
## 220 2012 Yolo County, CA 2451
## 221 2012 Unidentified Counties, CA 10394
## 222 2012 503755
## 223 2013 Alameda County, CA 19257
## 224 2013 Butte County, CA 2415
## 225 2013 Contra Costa County, CA 12154
## 226 2013 El Dorado County, CA 1534
## 227 2013 Fresno County, CA 15737
## 228 2013 Humboldt County, CA 1531
## 229 2013 Imperial County, CA 3094
## 230 2013 Kern County, CA 14149
## 231 2013 Kings County, CA 2394
## 232 2013 Los Angeles County, CA 128598
## 233 2013 Madera County, CA 2315
## 234 2013 Marin County, CA 2321
## 235 2013 Merced County, CA 4162
## 236 2013 Monterey County, CA 6547
## 237 2013 Napa County, CA 1450
## 238 2013 Orange County, CA 37281
## 239 2013 Placer County, CA 3688
## 240 2013 Riverside County, CA 29941
## 241 2013 Sacramento County, CA 19371
## 242 2013 San Bernardino County, CA 30246
## 243 2013 San Diego County, CA 43659
## 244 2013 San Francisco County, CA 8814
## 245 2013 San Joaquin County, CA 9800
## 246 2013 San Luis Obispo County, CA 2650
## 247 2013 San Mateo County, CA 8824
## 248 2013 Santa Barbara County, CA 5755
## 249 2013 Santa Clara County, CA 23313
## 250 2013 Santa Cruz County, CA 2871
## 251 2013 Shasta County, CA 2143
## 252 2013 Solano County, CA 5259
## 253 2013 Sonoma County, CA 4983
## 254 2013 Stanislaus County, CA 7579
## 255 2013 Tulare County, CA 7653
## 256 2013 Ventura County, CA 10446
## 257 2013 Yolo County, CA 2491
## 258 2013 Unidentified Counties, CA 10280
## 259 2013 494705
## 260 2014 Alameda County, CA 19650
## 261 2014 Butte County, CA 2481
## 262 2014 Contra Costa County, CA 12557
## 263 2014 El Dorado County, CA 1618
## 264 2014 Fresno County, CA 15762
## 265 2014 Humboldt County, CA 1468
## 266 2014 Imperial County, CA 3226
## 267 2014 Kern County, CA 14193
## 268 2014 Kings County, CA 2350
## 269 2014 Los Angeles County, CA 130289
## 270 2014 Madera County, CA 2313
## 271 2014 Marin County, CA 2401
## 272 2014 Merced County, CA 4164
## 273 2014 Monterey County, CA 6455
## 274 2014 Napa County, CA 1475
## 275 2014 Orange County, CA 38595
## 276 2014 Placer County, CA 3631
## 277 2014 Riverside County, CA 30235
## 278 2014 Sacramento County, CA 19871
## 279 2014 San Bernardino County, CA 31226
## 280 2014 San Diego County, CA 44596
## 281 2014 San Francisco County, CA 9104
## 282 2014 San Joaquin County, CA 10113
## 283 2014 San Luis Obispo County, CA 2596
## 284 2014 San Mateo County, CA 9083
## 285 2014 Santa Barbara County, CA 5830
## 286 2014 Santa Clara County, CA 23742
## 287 2014 Santa Cruz County, CA 3069
## 288 2014 Shasta County, CA 2083
## 289 2014 Solano County, CA 5253
## 290 2014 Sonoma County, CA 5070
## 291 2014 Stanislaus County, CA 7511
## 292 2014 Tulare County, CA 7640
## 293 2014 Ventura County, CA 10468
## 294 2014 Yolo County, CA 2394
## 295 2014 Unidentified Counties, CA 10367
## 296 2014 502879
## 297 2015 Alameda County, CA 19434
## 298 2015 Butte County, CA 2442
## 299 2015 Contra Costa County, CA 12596
## 300 2015 El Dorado County, CA 1594
## 301 2015 Fresno County, CA 15359
## 302 2015 Humboldt County, CA 1441
## 303 2015 Imperial County, CA 3168
## 304 2015 Kern County, CA 13768
## 305 2015 Kings County, CA 2274
## 306 2015 Los Angeles County, CA 124641
## 307 2015 Madera County, CA 2225
## 308 2015 Marin County, CA 2288
## 309 2015 Merced County, CA 4104
## 310 2015 Monterey County, CA 6420
## 311 2015 Napa County, CA 1457
## 312 2015 Orange County, CA 37609
## 313 2015 Placer County, CA 3747
## 314 2015 Riverside County, CA 30491
## 315 2015 Sacramento County, CA 19423
## 316 2015 San Bernardino County, CA 30530
## 317 2015 San Diego County, CA 43942
## 318 2015 San Francisco County, CA 8972
## 319 2015 San Joaquin County, CA 9983
## 320 2015 San Luis Obispo County, CA 2668
## 321 2015 San Mateo County, CA 9037
## 322 2015 Santa Barbara County, CA 5673
## 323 2015 Santa Clara County, CA 23393
## 324 2015 Santa Cruz County, CA 2840
## 325 2015 Shasta County, CA 2073
## 326 2015 Solano County, CA 5131
## 327 2015 Sonoma County, CA 5015
## 328 2015 Stanislaus County, CA 7698
## 329 2015 Tulare County, CA 7411
## 330 2015 Ventura County, CA 10060
## 331 2015 Yolo County, CA 2402
## 332 2015 Unidentified Counties, CA 10439
## 333 2015 491748
## 334 2016 Alameda County, CA 19573
## 335 2016 Butte County, CA 2490
## 336 2016 Contra Costa County, CA 12340
## 337 2016 El Dorado County, CA 1601
## 338 2016 Fresno County, CA 15129
## 339 2016 Humboldt County, CA 1482
## 340 2016 Imperial County, CA 2939
## 341 2016 Kern County, CA 13728
## 342 2016 Kings County, CA 2248
## 343 2016 Los Angeles County, CA 123092
## 344 2016 Madera County, CA 2355
## 345 2016 Marin County, CA 2252
## 346 2016 Merced County, CA 4117
## 347 2016 Monterey County, CA 6219
## 348 2016 Napa County, CA 1406
## 349 2016 Orange County, CA 38106
## 350 2016 Placer County, CA 3732
## 351 2016 Riverside County, CA 30661
## 352 2016 Sacramento County, CA 19588
## 353 2016 San Bernardino County, CA 31032
## 354 2016 San Diego County, CA 42720
## 355 2016 San Francisco County, CA 9062
## 356 2016 San Joaquin County, CA 10268
## 357 2016 San Luis Obispo County, CA 2581
## 358 2016 San Mateo County, CA 8960
## 359 2016 Santa Barbara County, CA 5501
## 360 2016 Santa Clara County, CA 23042
## 361 2016 Santa Cruz County, CA 2799
## 362 2016 Shasta County, CA 2048
## 363 2016 Solano County, CA 5259
## 364 2016 Sonoma County, CA 4962
## 365 2016 Stanislaus County, CA 7862
## 366 2016 Tulare County, CA 7146
## 367 2016 Ventura County, CA 9592
## 368 2016 Yolo County, CA 2423
## 369 2016 Unidentified Counties, CA 10512
## 370 2016 488827
## 371 2017 Alameda County, CA 18888
## 372 2017 Butte County, CA 2386
## 373 2017 Contra Costa County, CA 12180
## 374 2017 El Dorado County, CA 1570
## 375 2017 Fresno County, CA 14541
## 376 2017 Humboldt County, CA 1372
## 377 2017 Imperial County, CA 2894
## 378 2017 Kern County, CA 13326
## 379 2017 Kings County, CA 2373
## 380 2017 Los Angeles County, CA 116950
## 381 2017 Madera County, CA 2120
## 382 2017 Marin County, CA 2237
## 383 2017 Merced County, CA 4202
## 384 2017 Monterey County, CA 5810
## 385 2017 Napa County, CA 1291
## 386 2017 Orange County, CA 37369
## 387 2017 Placer County, CA 3689
## 388 2017 Riverside County, CA 29857
## 389 2017 Sacramento County, CA 19202
## 390 2017 San Bernardino County, CA 29643
## 391 2017 San Diego County, CA 41230
## 392 2017 San Francisco County, CA 8947
## 393 2017 San Joaquin County, CA 9928
## 394 2017 San Luis Obispo County, CA 2550
## 395 2017 San Mateo County, CA 8585
## 396 2017 Santa Barbara County, CA 5531
## 397 2017 Santa Clara County, CA 22133
## 398 2017 Santa Cruz County, CA 2658
## 399 2017 Shasta County, CA 2008
## 400 2017 Solano County, CA 5131
## 401 2017 Sonoma County, CA 4642
## 402 2017 Stanislaus County, CA 7441
## 403 2017 Tulare County, CA 7130
## 404 2017 Ventura County, CA 9318
## 405 2017 Yolo County, CA 2272
## 406 2017 Unidentified Counties, CA 10254
## 407 2017 471658
## 408 2018 Alameda County, CA 18240
## 409 2018 Butte County, CA 2430
## 410 2018 Contra Costa County, CA 12002
## 411 2018 El Dorado County, CA 1674
## 412 2018 Fresno County, CA 14465
## 413 2018 Humboldt County, CA 1364
## 414 2018 Imperial County, CA 2629
## 415 2018 Kern County, CA 12916
## 416 2018 Kings County, CA 2262
## 417 2018 Los Angeles County, CA 110271
## 418 2018 Madera County, CA 2079
## 419 2018 Marin County, CA 2127
## 420 2018 Merced County, CA 3875
## 421 2018 Monterey County, CA 5895
## 422 2018 Napa County, CA 1204
## 423 2018 Orange County, CA 35679
## 424 2018 Placer County, CA 3663
## 425 2018 Riverside County, CA 28725
## 426 2018 Sacramento County, CA 19102
## 427 2018 San Bernardino County, CA 28994
## 428 2018 San Diego County, CA 40070
## 429 2018 San Francisco County, CA 8697
## 430 2018 San Joaquin County, CA 9841
## 431 2018 San Luis Obispo County, CA 2445
## 432 2018 San Mateo County, CA 8330
## 433 2018 Santa Barbara County, CA 5268
## 434 2018 Santa Clara County, CA 21292
## 435 2018 Santa Cruz County, CA 2449
## 436 2018 Shasta County, CA 1966
## 437 2018 Solano County, CA 5033
## 438 2018 Sonoma County, CA 4526
## 439 2018 Stanislaus County, CA 7364
## 440 2018 Tulare County, CA 6905
## 441 2018 Ventura County, CA 9065
## 442 2018 Yolo County, CA 2135
## 443 2018 Unidentified Counties, CA 9938
## 444 2018 454920
## 445 2019 Alameda County, CA 18212
## 446 2019 Butte County, CA 2154
## 447 2019 Contra Costa County, CA 11729
## 448 2019 El Dorado County, CA 1524
## 449 2019 Fresno County, CA 14057
## 450 2019 Humboldt County, CA 1417
## 451 2019 Imperial County, CA 2533
## 452 2019 Kern County, CA 12765
## 453 2019 Kings County, CA 2115
## 454 2019 Los Angeles County, CA 107231
## 455 2019 Madera County, CA 2045
## 456 2019 Marin County, CA 2071
## 457 2019 Merced County, CA 3806
## 458 2019 Monterey County, CA 5846
## 459 2019 Napa County, CA 1279
## 460 2019 Orange County, CA 35052
## 461 2019 Placer County, CA 3658
## 462 2019 Riverside County, CA 28026
## 463 2019 Sacramento County, CA 19089
## 464 2019 San Bernardino County, CA 28656
## 465 2019 San Diego County, CA 38540
## 466 2019 San Francisco County, CA 8438
## 467 2019 San Joaquin County, CA 10009
## 468 2019 San Luis Obispo County, CA 2447
## 469 2019 San Mateo County, CA 8206
## 470 2019 Santa Barbara County, CA 5537
## 471 2019 Santa Clara County, CA 21184
## 472 2019 Santa Cruz County, CA 2434
## 473 2019 Shasta County, CA 1903
## 474 2019 Solano County, CA 5065
## 475 2019 Sonoma County, CA 4395
## 476 2019 Stanislaus County, CA 7402
## 477 2019 Tulare County, CA 6714
## 478 2019 Ventura County, CA 8736
## 479 2019 Yolo County, CA 2057
## 480 2019 Unidentified Counties, CA 10147
## 481 2019 446479
## 482 NA 6512502
## 483 NA NA
## 484 NA NA
## 485 NA NA
## 486 NA NA
## 487 NA NA
## 488 NA NA
## 489 NA NA
## 490 NA NA
## 491 NA NA
## 492 NA NA
## 493 NA NA
## 494 NA NA
## 495 NA NA
## 496 NA NA
## 497 NA NA
## 498 NA NA
## 499 NA NA
## 500 NA NA
## 501 NA NA
## 502 NA NA
## 503 NA NA
## 504 NA NA
## 505 NA NA
## 506 NA NA
## 507 NA NA
## 508 NA NA
## 509 NA NA
## 510 NA NA
## 511 NA NA
## 512 NA NA
## 513 NA NA
## 514 NA NA
## 515 NA NA
## 516 NA NA
## 517 NA NA
## 518 NA NA
## 519 NA NA
## 520 NA NA
## 521 NA NA
## 522 NA NA
## 523 NA NA
#The downloaded data did not include the total number of births, so two datasets are merged here: one with total birth counts and one with low birth weight + very low birth weight counts
df1 <- full_join(cdc_lowbirthweight , MCH.CDC.Data.Total, by=c("Year", "County"))
df1<- df1 %>% rename("cases" = "Births.x", "total_births" = "Births.y")
#Note: LBW = Low birth weight + Very low birth weight counts; Total Births = Total # of Birth
col_order <- c("Year", "County", "total_births",
"cases", "Average.Birth.Weight", "Standard.Deviation.for.Average.Birth.Weight",
"Average.Age.of.Mother", "Standard.Deviation.for.Average.Age.of.Mother","Average.LMP.Gestational.Age",
"Standard.Deviation.for.Average.LMP.Gestational.Age")
df2 <- df1[,col_order]
df2[df2$County == "Alameda County, CA", "County"] <-"alameda"
df2[df2$County == "Butte County, CA", "County"] <-"butte"
df2[df2$County == "Contra Costa County, CA", "County"] <-"contra costa"
df2[df2$County == "El Dorado County, CA", "County"] <-"el dorado"
df2[df2$County == "Fresno County, CA", "County"] <-"fresno"
df2[df2$County == "Humboldt County, CA", "County"] <-"humboldt"
df2[df2$County == "Imperial County, CA", "County"] <-"imperial"
df2[df2$County == "Kern County, CA", "County"] <-"kern"
df2[df2$County == "Kings County, CA", "County"] <-"kings"
df2[df2$County == "Los Angeles County, CA", "County"] <-"los angeles"
df2[df2$County == "Madera County, CA", "County"] <-"madera"
df2[df2$County == "Marin County, CA", "County"] <-"marin"
df2[df2$County == "Mariposa County, CA", "County"] <-"mariposa"
df2[df2$County == "Merced County, CA", "County"] <-"merced"
df2[df2$County == "Monterey County, CA", "County"] <-"monterey"
df2[df2$County == "Napa County, CA", "County"] <-"napa"
df2[df2$County == "Orange County, CA", "County"] <-"orange"
df2[df2$County == "Placer County, CA", "County"] <-"placer"
df2[df2$County == "Riverside County, CA", "County"] <-"riverside"
df2[df2$County == "Sacramento County, CA", "County"] <-"sacramento"
df2[df2$County == "San Bernardino County, CA", "County"] <-"san bernardino"
df2[df2$County == "San Diego County, CA", "County"] <-"san diego"
df2[df2$County == "San Francisco County, CA", "County"] <-"san francisco"
df2[df2$County == "San Joaquin County, CA", "County"] <-"san joaquin"
df2[df2$County == "San Luis Obispo County, CA", "County"] <-"san luis obispo"
df2[df2$County == "San Mateo County, CA", "County"] <-"san mateo"
df2[df2$County == "Santa Barbara County, CA", "County"] <-"santa barbara"
df2[df2$County == "Santa Clara County, CA", "County"] <-"santa clara"
df2[df2$County == "Santa Cruz County, CA", "County"] <-"santa cruz"
df2[df2$County == "Shasta County, CA", "County"] <-"shasta"
df2[df2$County == "Solano County, CA", "County"] <-"solano"
df2[df2$County == "Sonoma County, CA", "County"] <-"sonoma"
df2[df2$County == "Stanislaus County, CA", "County"] <-"stanislaus"
df2[df2$County == "Tulare County, CA", "County"] <-"tulare"
df2[df2$County == "Ventura County, CA", "County"] <-"ventura"
df2[df2$County == "Yolo County, CA", "County"] <-"yolo"
df2 <- df2 %>% filter(!is.na(total_births)) %>% filter(!is.na(cases)) %>% mutate(rate = cases/total_births * 10^2)
df2$County <- df2$County %>% str_to_title()
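The county-by-county recoding above could also be written as a single pattern replacement. This is only a sketch under the assumption that every label follows the "Name County, CA" pattern seen in the printed table; `county_raw` is a made-up vector, not the project data. Labels that do not match, such as the "Unidentified Counties" row, are left unchanged.

```r
# Toy labels in the CDC WONDER style (illustrative only)
county_raw <- c("Alameda County, CA", "Contra Costa County, CA",
                "San Luis Obispo County, CA", "Unidentified Counties, CA")

# Remove the trailing " County, CA"; anything that does not match stays as it is
county_clean <- sub(" County, CA$", "", county_raw)

print(county_clean)
```

Since the source labels are already capitalized, this form would not need the lower-case step followed by `str_to_title()`.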
This is my data wrangling process for the preterm birth data from the CDC WONDER database. By default, the CDC WONDER live birth database only displays counties with a population above 100,000. I only looked at preterm birth here, and the result feeds the bar graph in my Shiny app.
cdc_pretermbirth <- read.delim("Preterm birth.txt", sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
cdc_pretermbirth <- cdc_pretermbirth [-c(422:472), ]
cdc_pretermbirth <- cdc_pretermbirth [ ,-c(1, 3, 5)]
cdc_pretermbirth <- cdc_pretermbirth %>% rename("Events" = "Births")
MCH.CDC.Data.Total <- read.delim("MCH CDC Data Total.txt", sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data.Total <- MCH.CDC.Data.Total[,-c(1, 3, 5)]
MCH.CDC.Data.Total <- MCH.CDC.Data.Total %>% rename("total_birth" = "Births")
df1_pt <- full_join(cdc_pretermbirth , MCH.CDC.Data.Total, by=c("Year", "County"))
df1_pt[df1_pt$County == "Alameda County, CA", "County"] <-"alameda"
df1_pt[df1_pt$County == "Butte County, CA", "County"] <-"butte"
df1_pt[df1_pt$County == "Contra Costa County, CA", "County"] <-"contra costa"
df1_pt[df1_pt$County == "El Dorado County, CA", "County"] <-"el dorado"
df1_pt[df1_pt$County == "Fresno County, CA", "County"] <-"fresno"
df1_pt[df1_pt$County == "Humboldt County, CA", "County"] <-"humboldt"
df1_pt[df1_pt$County == "Imperial County, CA", "County"] <-"imperial"
df1_pt[df1_pt$County == "Kern County, CA", "County"] <-"kern"
df1_pt[df1_pt$County == "Kings County, CA", "County"] <-"kings"
df1_pt[df1_pt$County == "Los Angeles County, CA", "County"] <-"los angeles"
df1_pt[df1_pt$County == "Madera County, CA", "County"] <-"madera"
df1_pt[df1_pt$County == "Marin County, CA", "County"] <-"marin"
df1_pt[df1_pt$County == "Mariposa County, CA", "County"] <-"mariposa"
df1_pt[df1_pt$County == "Merced County, CA", "County"] <-"merced"
df1_pt[df1_pt$County == "Monterey County, CA", "County"] <-"monterey"
df1_pt[df1_pt$County == "Napa County, CA", "County"] <-"napa"
df1_pt[df1_pt$County == "Orange County, CA", "County"] <-"orange"
df1_pt[df1_pt$County == "Placer County, CA", "County"] <-"placer"
df1_pt[df1_pt$County == "Riverside County, CA", "County"] <-"riverside"
df1_pt[df1_pt$County == "Sacramento County, CA", "County"] <-"sacramento"
df1_pt[df1_pt$County == "San Bernardino County, CA", "County"] <-"san bernardino"
df1_pt[df1_pt$County == "San Diego County, CA", "County"] <-"san diego"
df1_pt[df1_pt$County == "San Francisco County, CA", "County"] <-"san francisco"
df1_pt[df1_pt$County == "San Joaquin County, CA", "County"] <-"san joaquin"
df1_pt[df1_pt$County == "San Luis Obispo County, CA", "County"] <-"san luis obispo"
df1_pt[df1_pt$County == "San Mateo County, CA", "County"] <-"san mateo"
df1_pt[df1_pt$County == "Santa Barbara County, CA", "County"] <-"santa barbara"
df1_pt[df1_pt$County == "Santa Clara County, CA", "County"] <-"santa clara"
df1_pt[df1_pt$County == "Santa Cruz County, CA", "County"] <-"santa cruz"
df1_pt[df1_pt$County == "Shasta County, CA", "County"] <-"shasta"
df1_pt[df1_pt$County == "Solano County, CA", "County"] <-"solano"
df1_pt[df1_pt$County == "Sonoma County, CA", "County"] <-"sonoma"
df1_pt[df1_pt$County == "Stanislaus County, CA", "County"] <-"stanislaus"
df1_pt[df1_pt$County == "Tulare County, CA", "County"] <-"tulare"
df1_pt[df1_pt$County == "Ventura County, CA", "County"] <-"ventura"
df1_pt[df1_pt$County == "Yolo County, CA", "County"] <-"yolo"
df1_pt <- df1_pt %>% mutate(County = str_to_title(County))
df1_pt <- df1_pt %>% filter(!is.na(total_birth)) %>% filter(!is.na(Events)) %>% mutate(rate = Events/total_birth * 10^2)
I then joined the CDC WONDER data (low birth weight and preterm birth) with Zainab’s wrangled pesticide data to produce a combined data set, and generated bar graphs to visualize trends over 2007-2016 (please see the Shiny app). Fresno and Kern were the two counties with the highest pesticide use; both lie in the San Joaquin Valley, a highly productive agricultural region.
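The join described here matches rows on county name and year, so the keys must agree exactly, including letter case. The sketch below uses made-up tables and base R `merge()` in place of `inner_join()`. Whether the real tables differ in case is not shown in this section, so the mismatch is hypothetical.

```r
# Toy outcome and exposure tables (all values are made up for illustration)
births <- data.frame(County = c("Fresno", "Kern", "Yolo"),
                     Year   = c(2016, 2016, 2016),
                     rate   = c(7.1, 7.4, 5.9))
pesticide <- data.frame(county = c("FRESNO", "KERN", "TULARE"),
                        usage  = c(2016, 2016, 2016),
                        value  = c(3.1e7, 2.9e7, 1.8e7))

# Put both county keys in the same case before matching
births$county_key    <- tolower(births$County)
pesticide$county_key <- tolower(pesticide$county)

# merge() keeps only rows present in both tables by default (an inner join)
joined <- merge(births, pesticide,
                by.x = c("county_key", "Year"),
                by.y = c("county_key", "usage"))

print(joined[, c("County", "Year", "rate", "value")])
```

Counties that appear in only one table (Yolo and Tulare in this toy case) drop out of the result, which is worth checking for when counting how many counties survive the join.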
county_ranks16 <- read_delim("table1_county_rank_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
## COUNTY = col_character(),
## LBS_2015 = col_double(),
## RANK_2015 = col_double(),
## LBS_2016 = col_double(),
## RANK_2016 = col_double()
## )
repro_lbs16 <- read_delim("table3_reproductive_lbs_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
## CHEMICAL = col_character(),
## LBS_2007 = col_double(),
## LBS_2008 = col_double(),
## LBS_2009 = col_double(),
## LBS_2010 = col_double(),
## LBS_2011 = col_double(),
## LBS_2012 = col_double(),
## LBS_2013 = col_double(),
## LBS_2014 = col_double(),
## LBS_2015 = col_double(),
## LBS_2016 = col_double()
## )
repro_acre16 <- read_delim("table4_reproductive_acres_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
## CHEMNAME = col_character(),
## ACRES_2007 = col_double(),
## ACRES_2008 = col_double(),
## ACRES_2009 = col_double(),
## ACRES_2010 = col_double(),
## ACRES_2011 = col_double(),
## ACRES_2012 = col_double(),
## ACRES_2013 = col_double(),
## ACRES_2014 = col_double(),
## ACRES_2015 = col_double(),
## ACRES_2016 = col_double()
## )
table1_2016 <- county_ranks16 %>% transmute(county = COUNTY,
lbs_2015 = LBS_2015, rank_2015 = RANK_2015,
lbs_2016 = LBS_2016, rank_2016 = RANK_2016)
all_dat <- list(read_csv("table1_2007.csv")[1:3],
read_csv("table1_2008.csv")[1:3],
read_csv("table1_2009.csv")[1:3],
read_csv("table1_2010.csv")[1:3],
read_csv("table1_2011.csv")[1:3],
read_csv("table1_2012.csv")[1:3],
read_csv("table1_2013.csv")[1:3],
read_csv("table1_2014.csv")[1:3],
read_csv("table1_2015.csv")[1:3])
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2006 = col_double(),
## rank_2006 = col_double(),
## lbs_2007 = col_double(),
## rank_2007 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2007 = col_double(),
## rank_2007 = col_double(),
## lbs_2008 = col_double(),
## rank_2008 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2008 = col_double(),
## rank_2008 = col_double(),
## lbs_2009 = col_double(),
## rank_2009 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2009 = col_double(),
## rank_2009 = col_double(),
## lbs_2010 = col_double(),
## rank_2010 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2010 = col_double(),
## rank_2010 = col_double(),
## lbs_2011 = col_double(),
## rank_2011 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2011 = col_double(),
## rank_2011 = col_double(),
## lbs_2012 = col_double(),
## rank_2012 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2012 = col_double(),
## rank_2012 = col_double(),
## lbs_2013 = col_double(),
## rank_2013 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2013 = col_double(),
## rank_2013 = col_double(),
## lbs_2014 = col_double(),
## rank_2014 = col_double()
## )
## Parsed with column specification:
## cols(
## county = col_character(),
## lbs_2014 = col_double(),
## rank_2014 = col_double(),
## lbs_2015 = col_double(),
## rank_2015 = col_double()
## )
table1 <- Reduce(function(x, y) left_join(x, y, by = "county"), all_dat)
long_table1 <- table1 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
table1_ranks <- long_table1 %>% filter(str_starts(usage, "rank"))
table1_lbs <- long_table1 %>% filter(str_starts(usage, "lbs"))
table1_lbs$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs$usage))
long_table2 <- table1_2016 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
table1_ranks_1516 <- long_table2 %>% filter(str_starts(usage, "rank"))
table1_lbs_1516<- long_table2 %>% filter(str_starts(usage, "lbs"))
table1_lbs_1516$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs_1516$usage))
combined_pesticide_use <- table1_lbs %>% full_join(table1_lbs_1516)
## Joining, by = c("county", "usage", "value")
combined_pesticide_use <- combined_pesticide_use %>% group_by(usage)
combined_pesticide_use <- combined_pesticide_use %>% arrange(usage)
averagebw <-df2 %>% select("County", "Year", "rate")
pesticide_averagebw_join <- averagebw %>% inner_join(combined_pesticide_use, by = c("County" = "county", "Year" = "usage"))
averagept <-df1_pt %>% select("County", "Year", "rate")
pesticide_averagept_join <- averagept %>% inner_join(combined_pesticide_use, by = c("County" = "county", "Year" = "usage"))
#bar graph of low birth weight
pesticide_averagebw_join %>% ggplot(aes(County, rate)) + geom_col() + ylab("Low Birth Weight Rate (%)") +xlab("") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1/2))
#bar graph of pesticide
pesticide_averagept_join %>% ggplot(aes(County, value)) + geom_col() + ylab("Pesticide Use (Pounds)") +xlab("") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1/2))
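One thing to keep in mind with the bar graphs above: the joined tables hold one row per county and year, and `geom_col()` stacks rows that share an x value. Without a year filter, each bar therefore shows the sum across years. A minimal sketch with made-up numbers, assuming a per-county mean is what is wanted:

```r
# Toy joined table with two years per county (values are illustrative only)
toy <- data.frame(County = c("Fresno", "Fresno", "Kern", "Kern"),
                  Year   = c(2015, 2016, 2015, 2016),
                  rate   = c(7.0, 7.2, 7.4, 7.6))

# Average the yearly rates within each county so each county has one row
county_mean <- aggregate(rate ~ County, data = toy, FUN = mean)

print(county_mean)
```

The resulting `county_mean` could then be passed to `ggplot(aes(County, rate)) + geom_col()`. Filtering to a single year first is another option.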
This is a similar spatial map, but for pesticide use. The code below builds a spatial object for California’s counties and merges it with the pesticide use data I wrangled earlier to generate a leaflet map.
averagept_df <- combined_pesticide_use %>% filter(usage == 2016)
map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/lararostomian/Desktop/Harvard/Classes/BST 260/datascience-project/Data Prep (& Final RMD)/cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
"California"
))
spatial_pesticide <-sp::merge(x=SingleState, y=averagept_df, by.x="NAME", by.y="county")
binpes <- c(200, 100145, 1131454, 3345277, Inf)
pal3 <- colorBin(
palette = "magma",
domain = spatial_pesticide$value, n=7, bins=binpes)
leaflet(spatial_pesticide, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>%
setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
addPolygons(color = "Black", weight = 1, smoothFactor = 0.5,
opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
fillColor = ~pal3(value),
popup = ~paste0("<center><font size=\"4\"><b>County: </b>", NAME, "</font></center>", "<b>Amount of pesticide used (pounds): </b>", sprintf("%1.2f", value), "<br/>")) %>%
addLegend(pal = pal3, values = spatial_pesticide$value, opacity = 1, title="Amounts of Pesticide Used (Pounds)")
#Top 10 counties in terms of pesticide usage
agro <- c("Kern", "Tulare", "Fresno", "Monterey", "Merced", "Stanislaus",
"San Joaquin", "Ventura", "Madera", "Kings")
mch_regression <- MCH.CDC.Data_Race %>%
filter(Year == 2016) %>%
mutate(agricultural = ifelse(County %in% agro, 1, 0))
linmod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, mch_regression)
summary(linmod)[4]
## $coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5527.0554 870.44429 -6.349695 3.049477e-09
## Average.LMP.Gestational.Age 227.3201 22.48768 10.108650 3.118120e-18
summary(linmod)[9]
## $adj.r.squared
## [1] 0.4266074
mch_regression %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
geom_point() +
geom_line(aes(y = predict(linmod))) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
ggtitle("Birth Weight Outcomes by Pesticide Usage") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
Being a top-ten pesticide-use county did not appear to have much effect on average birth weight, but mother's race did.
avg_bw_mod2016 <- lm(Average.Birth.Weight ~ factor(Mothers.Race, ordered = F) + Average.LMP.Gestational.Age, mch_regression)
summary(avg_bw_mod2016)[4]
## $coefficients
## Estimate
## (Intercept) -3547.882869
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander -109.533821
## factor(Mothers.Race, ordered = F)Black or African American -111.909013
## factor(Mothers.Race, ordered = F)White -7.310527
## Average.LMP.Gestational.Age 177.697257
## Std. Error
## (Intercept) 642.55388
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander 12.34741
## factor(Mothers.Race, ordered = F)Black or African American 12.46519
## factor(Mothers.Race, ordered = F)White 12.51234
## Average.LMP.Gestational.Age 16.59650
## t value
## (Intercept) -5.5215337
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander -8.8709968
## factor(Mothers.Race, ordered = F)Black or African American -8.9777224
## factor(Mothers.Race, ordered = F)White -0.5842655
## Average.LMP.Gestational.Age 10.7069102
## Pr(>|t|)
## (Intercept) 1.717557e-07
## factor(Mothers.Race, ordered = F)Asian or Pacific Islander 4.427547e-15
## factor(Mothers.Race, ordered = F)Black or African American 2.426946e-15
## factor(Mothers.Race, ordered = F)White 5.600389e-01
## Average.LMP.Gestational.Age 1.217500e-19
summary(avg_bw_mod2016)[9]
## $adj.r.squared
## [1] 0.7199346
#parallel lines; Black and Asian/Pacific Islander populations fare the worst
mch_regression %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = Mothers.Race)) +
geom_point() + geom_line(aes(y = predict(avg_bw_mod2016))) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Mother's Race") +
ggtitle("Birth Weight Outcomes by Race") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
Because pesticide use showed little apparent effect while race did, we stratified the models by race.
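Stratifying means fitting the same regression separately within each race group and comparing the fits. The sketch below shows one compact way to do that with `split()` and `lapply()` in base R. It runs on simulated data: the group labels, sample sizes, and coefficients are all invented for illustration, and only the column names are borrowed from `mch_regression`. It is not the code used for the models reported in this section, which are fit one at a time.

```r
set.seed(1)

# Simulated stand-in for mch_regression; labels and numbers are invented.
races <- c("Group A", "Group B", "Group C")
toy <- data.frame(
  Mothers.Race = rep(races, each = 20),
  Average.LMP.Gestational.Age = runif(60, 37.6, 39.6)
)
toy$Average.Birth.Weight <- -4000 +
  190 * toy$Average.LMP.Gestational.Age + rnorm(60, sd = 40)

# Fit one simple linear regression per stratum.
fits <- lapply(split(toy, toy$Mothers.Race), function(d) {
  lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, data = d)
})

# Collect the slope and adjusted R-squared of each stratum in one table.
strat_summary <- data.frame(
  race = names(fits),
  slope = sapply(fits, function(m) coef(m)[["Average.LMP.Gestational.Age"]]),
  adj_r2 = sapply(fits, function(m) summary(m)$adj.r.squared),
  row.names = NULL
)
print(strat_summary)
```

Putting the per-group slopes and adjusted R-squared values side by side makes it easier to spot a stratum where the simple model fits poorly.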
#for American Indian/Alaska Native mothers, the simple linear model is more parsimonious than the one with the interaction term
#though this group has low population counts, so I don't really trust any of the models
amerindian_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "American Indian or Alaska Native"))
summary(amerindian_mod)[4] #coefficients
## $coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3829.7135 1246.73649 -3.071791 4.495686e-03
## Average.LMP.Gestational.Age 184.9774 32.20391 5.743941 2.857815e-06
summary(amerindian_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.5078807
q1 <- mch_regression %>%
filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
geom_point() + geom_line(aes(y = predict(amerindian_mod)), size = 1) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
ggtitle("Birth Weight Outcome for American Indian and Alaska Native Mothers") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
#even simple linear regression explains little of the variation for Asian mothers
asian_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Asian or Pacific Islander"))
summary(asian_mod)[4] #coefficients
## $coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -429.60128 1658.17961 -0.2590801 0.79718280
## Average.LMP.Gestational.Age 94.23017 42.87789 2.1976402 0.03509986
summary(asian_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.1012334
q2 <- mch_regression %>%
filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
geom_point() +
geom_line(aes(y = predict(asian_mod)), size = 1) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
ggtitle("Birth Weight Outcome for Asian and Pacific Islander Mothers") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
#simple linear model best for Black mothers
black_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Black or African American"))
summary(black_mod)[4] #coefficients
## $coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4187.3941 1234.47274 -3.392051 1.816634e-03
## Average.LMP.Gestational.Age 191.3651 31.97847 5.984186 1.010634e-06
summary(black_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.5058892
q3 <- mch_regression %>%
filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
geom_point() + geom_line(aes(y = predict(black_mod)), size = 1) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
ggtitle("Birth Weight Outcome for Black Mothers") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
# Simple Linear Regression best shows relationship for White mothers
white_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "White"))
summary(white_mod)[4] #coefficients
## $coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4461.0894 1011.08034 -4.412201 1.030204e-04
## Average.LMP.Gestational.Age 201.0204 26.03101 7.722343 6.794002e-09
summary(white_mod)[9] #adjusted r-squared
## $adj.r.squared
## [1] 0.6329664
q4 <- mch_regression %>%
filter(Mothers.Race == "White" ) %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
geom_point() + geom_line(aes(y = predict(white_mod)), size = 1) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
ggtitle("Birth Weight Outcome for White Mothers") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
q1
q2
q3
q4
#removed Imperial: the linear fit improved a lot, and Imperial has a small population in general
asian_mod_no7 <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Asian or Pacific Islander" & County != "Imperial"))
summary(asian_mod_no7)[4]
## $coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4485.823 1854.93330 -2.418320 0.0214593571
## Average.LMP.Gestational.Age 199.252 47.99022 4.151929 0.0002281163
summary(asian_mod_no7)[9]
## $adj.r.squared
## [1] 0.329793
mch_regression %>%
filter(Mothers.Race == "Asian or Pacific Islander" & County != "Imperial") %>%
ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) +
geom_point() +
geom_line(aes(y = predict(asian_mod_no7)), size = 1) +
xlab("Average LMP Gestational Age (weeks)") +
ylab("Average Birth Weight (grams)") +
scale_color_discrete(name = "Top Ten\nPesticide\nUse", labels = c('No', "Yes")) +
ggtitle("Birth Weight Outcome for Asian and Pacific Islander Mothers (no Imperial)") +
ylim(2980, 3520) +
xlim(37.6, 39.6)
plot(linmod)
hist(rstudent(linmod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
plot(avg_bw_mod2016)
hist(rstudent(avg_bw_mod2016), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
plot(asian_mod)
hist(rstudent(asian_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
plot(asian_mod_no7)
hist(rstudent(asian_mod_no7), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
plot(amerindian_mod)
hist(rstudent(amerindian_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
plot(black_mod)
hist(rstudent(black_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
plot(white_mod)
hist(rstudent(white_mod), probability = TRUE, main = "Histogram of Externally Studentized Residuals", col = "pink")
curve(dnorm,from=-4,to=4,add=TRUE)
Adding an indicator for the top-ranked pesticide-use counties does not improve any of the models.
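One way to check a statement like this is to compare each model with and without the indicator, using a partial F-test and the change in adjusted R-squared. The sketch below illustrates the comparison on simulated data in which the outcome depends on gestational age only. The variable names `gest_age`, `birth_weight`, and `toy` are hypothetical, the numbers are invented, and the result says nothing about the real data.

```r
set.seed(2)

# Simulated data: 58 rows, 10 of them flagged as "agricultural".
n <- 58
toy <- data.frame(
  gest_age = runif(n, 37.6, 39.6),
  agricultural = rep(c(1, 0), times = c(10, n - 10))
)

# By construction the outcome ignores the agricultural indicator.
toy$birth_weight <- -4000 + 190 * toy$gest_age + rnorm(n, sd = 40)

# Nested models: the second adds only the indicator.
base_mod <- lm(birth_weight ~ gest_age, data = toy)
agri_mod <- lm(birth_weight ~ gest_age + agricultural, data = toy)

# Partial F-test for the added term.
comparison <- anova(base_mod, agri_mod)
print(comparison)

# Adjusted R-squared penalizes the extra parameter, so it can go down.
adj_r2 <- c(base = summary(base_mod)$adj.r.squared,
            with_indicator = summary(agri_mod)$adj.r.squared)
print(adj_r2)
```

A large p-value in the second row of the `anova()` table, together with an adjusted R-squared that barely moves or falls, is the pattern that would support leaving the indicator out.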